NSF PAR Search | NSF Public Access Repository

Model Residuals as Shields: A Two-Level Formulation to Defend Smart Grids From Poisoning Attacks

https://doi.org/10.1109/JIOT.2025.3575005

Lin, Tung-Wei; Roy, Padmaksha; Zeng, Yi; Jin, Ming; Jia, Ruoxi; Liu, Chen-Ching; Sangiovanni-Vincentelli, Alberto (August 2025, IEEE Internet of Things Journal)

Full Text Available
Synergistic Density Functional Theory and Molecular Dynamics Approach to Elucidate PNIPAM–Water Interaction Mechanisms

https://doi.org/10.3390/ma18112498

Alomari, Noor; Aparicio, Santiago; Meyer, Paul; Zeng, Yi; Cui, Shuang; Gutierrez, Alberto; Atihan, Mert (May 2025, Materials)

This study employs Density Functional Theory (DFT) and Molecular Dynamics (MD) simulations to investigate interactions between water molecules and Poly(N-isopropylacrylamide) (PNIPAM). DFT reveals preferential water binding sites, with enhanced binding energy observed in the linker zone. Quantum Theory of Atoms in Molecules (QTAIM) and electron localization function (ELF) analyses highlight the roles of hydrogen bonding and steric hindrance. MD simulations unveil temperature-dependent hydration dynamics, with structural transitions marked by changes in the radius of gyration (Rg) and the radial distribution function (RDF), aligning with DFT findings. Our work goes beyond prior studies by combining DFT, QTAIM, and MD simulations across PNIPAM structures ranging from monomer to 30-mer. It introduces a systematic quantification of pseudo-saturation thresholds and explores water clustering dynamics with structural specificity, neither of which has been previously reported in the literature. These novel insights establish a more complete molecular-level picture of PNIPAM hydration behavior and temperature responsiveness. They emphasize the importance of amide hydrogen and carbonyl oxygen sites in hydrogen bonding, which weakens above the lower critical solution temperature (LCST), resulting in increased hydrophobicity. The results pave the way for understanding water sorption mechanisms and offer guidance for future applications such as dehumidification and atmospheric water harvesting.
Full Text Available
Evolutionary insights into elongation factor G using AlphaFold and ancestral analysis

https://doi.org/10.1016/j.compbiomed.2025.110188

Rahaman, Shawonur; Steele, Jacob H; Zeng, Yi; Xu, Shoujun; Wang, Yuhong (June 2025, Computers in Biology and Medicine)

Full Text Available
Air-bench 2024: A safety benchmark based on regulation and policies specified risk categories

Zeng, Yi; Yang, Yu; Zhou, Andy; Tan, Jeffrey Ziwei; Tu, Yuheng; Mai, Yifan; Klyman, Kevin; Pan, Minzhou; Jia, Ruoxi; Song, Dawn (April 2025, 13th International Conference on Learning Representations (ICLR 2025))

Foundation models (FMs) provide societal benefits but also amplify risks. Governments, companies, and researchers have proposed regulatory frameworks, acceptable use policies, and safety benchmarks in response. However, existing public benchmarks often define safety categories based on previous literature, intuitions, or common sense, leading to disjointed sets of categories for risks specified in recent regulations and policies, which makes it challenging to evaluate and compare FMs across these benchmarks. To bridge this gap, we introduce AIR-BENCH 2024, the first AI safety benchmark for language models aligned with emerging government regulations and company policies, following the regulation-based safety categories grounded in the AI risks taxonomy, AIR 2024. AIR 2024 decomposes 8 government regulations and 16 company policies into a four-tiered safety taxonomy with 314 granular risk categories in the lowest tier. AIR-BENCH 2024 contains 5,694 diverse prompts spanning these categories, with manual curation and human auditing to ensure quality. We evaluate leading language models on AIR-BENCH 2024, uncovering insights into their alignment with specified safety concerns. By bridging the gap between public benchmarks and practical AI risks, AIR-BENCH 2024 provides a foundation for assessing model safety across jurisdictions, fostering the development of safer and more responsible AI systems.
Full Text Available
Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data

Hu, Mengxuan; Guan, Zihan; Zeng, Yi; Guo, Junfeng; Zhou, Zhongliang; Zhang, Jielu; Jia, Ruoxi; Vullikanti, Anil; Li, Sheng (January 2025, International Conference on Learning Representations (ICLR))

Full Text Available
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

https://doi.org/10.18653/v1/2024.emnlp-main.732

Zeng, Yi; Sun, Weiyu; Huynh, Tran; Song, Dawn; Li, Bo; Jia, Ruoxi (November 2024, Association for Computational Linguistics)

Full Text Available
Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data

Hu, Mengxuan; Guan, Zihan; Zeng, Yi; Guo, Junfeng; Zhou, Zhongliang; Zhang, Jielu; Jia, Ruoxi; Vullikanti, Anil Kumar; Li, Sheng (January 2025, International Conference on Learning Representations)

Anti-backdoor learning, which aims to train clean models directly from poisoned datasets, serves as an important defense against backdoor attacks. However, existing methods usually fail to recover backdoored samples to their original, correct labels and generalize poorly to large pre-trained models because their training is not end-to-end, making them unsuitable for protecting the increasingly prevalent large pre-trained models. To bridge the gap, we first revisit the anti-backdoor learning problem from a causal perspective. Our theoretical causal analysis reveals that incorporating both images and the associated attack indicators preserves the model's integrity. Building on the theoretical analysis, we introduce an end-to-end method, Mind Control through Causal Inference (MCCI), to train clean models directly from poisoned datasets. This approach leverages both the image and the attack indicator to train the model. Based on this training paradigm, the model’s perception of whether an input is clean or backdoored can be controlled. Typically, by introducing fake non-attack indicators, the model perceives all inputs as clean and makes correct predictions, even for poisoned samples. Extensive experiments demonstrate that our method achieves state-of-the-art performance, efficiently recovering the original correct predictions for poisoned samples and enhancing accuracy on clean samples.
Full Text Available
RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Guo, Chengquan; Liu, Xun; Xie, Chulin; Zhou, Andy; Zeng, Yi; Lin, Zinan; Song, Dawn; Li, Bo (December 2024, Proceedings of the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS))

With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding and software development, safety and security concerns, such as generating or executing malicious code, have become significant barriers to the real-world deployment of these agents. To provide comprehensive and practical evaluations on the safety of code agents, we propose RedCode, an evaluation platform with benchmarks grounded in four key principles: real interaction with systems, holistic evaluation of unsafe code generation and execution, diverse input formats, and high-quality safety scenarios and tests. RedCode consists of two parts to evaluate agents’ safety in unsafe code execution and generation: (1) RedCode-Exec provides challenging code prompts in Python as inputs, aiming to evaluate code agents’ ability to recognize and handle unsafe code. We then map the Python code to other programming languages (e.g., Bash) and natural text summaries or descriptions for evaluation, leading to a total of over 4,000 testing instances. We provide 25 types of critical vulnerabilities spanning various domains, such as websites, file systems, and operating systems. We provide a Docker sandbox environment to evaluate the execution capabilities of code agents and design corresponding evaluation metrics to assess their execution results. (2) RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions to generate harmful code or software. Our empirical findings, derived from evaluating three agent frameworks based on 19 LLMs, provide insights into code agents’ vulnerabilities. For instance, evaluations on RedCode-Exec show that agents are more likely to reject executing unsafe operations on the operating system, but are less likely to reject executing technically buggy code, indicating high risks. Unsafe operations described in natural text lead to a lower rejection rate than those in code format. Additionally, evaluations on RedCode-Gen reveal that more capable base models and agents with stronger overall coding abilities, such as GPT4, tend to produce more sophisticated and effective harmful software. Our findings highlight the need for stringent safety evaluations for diverse code agents. Our dataset and code are publicly available at https://github.com/AI-secure/RedCode.
Full Text Available
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories

Zeng, Yi; Yang, Yu; Zhou, Andy; Tan, Jeffrey; Tu, Yuheng; Mai, Yifan; Klyman, Kevin; Pan, Minzhou; Jia, Ruoxi; Song, Dawn; et al (January 2025, International Conference on Learning Representations (ICLR))

Full Text Available
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Yuan, Zhuowen; Xiong, Zidi; Zeng, Yi; Yu, Ning; Jia, Ruoxi; Song, Dawn; Li, Bo (July 2024, International Conference on Machine Learning (ICML 2024))

Recent advancements in Large Language Models (LLMs) have showcased remarkable capabilities across various tasks in different domains. However, the emergence of biases and the potential for generating harmful content in LLMs, particularly under malicious inputs, pose significant challenges. Current mitigation strategies, while effective, are not resilient under adversarial attacks. This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently and effectively moderate harmful and unsafe inputs and outputs for LLMs. By employing a multi-faceted approach that includes energy-based training data augmentation through Langevin dynamics, optimizing a safe suffix for inputs via minimax optimization, and integrating a fusion-based model combining robust KNN with LLMs based on our data augmentation, RigorLLM offers a robust solution to harmful content moderation. Our experimental evaluations demonstrate that RigorLLM not only outperforms existing baselines like OpenAI API and Perspective API in detecting harmful content but also exhibits unparalleled resilience to jailbreaking attacks. The innovative use of constrained optimization and a fusion-based guardrail approach represents a significant step forward in developing more secure and reliable LLMs, setting a new standard for content moderation frameworks in the face of evolving digital threats.
Full Text Available
